INTRODUCTION


Matrix multiplication is useful in applications such as 
graphics, numerical analysis, or high-speed control. The 
purpose of this application report is to illustrate matrix 
multiplication on two digital signal processors, the 
TMS32010 and TMS32020. 

Both the TMS32010 and TMS32020 can multiply any
two matrices of size M x N and N x P. The programs for
the TMS32010 and TMS32020, included in the appendices,
can multiply large matrices and are limited only by the
amount of internal data RAM available. Assuming a 200-ns
cycle time, the TMS32010 and TMS32020 can calculate
[1 x 3] x [3 x 3] in 5.4 microseconds.

Before the two implementations of the matrix
multiplication algorithm are discussed, a brief review of matrix
multiplication is presented, along with three examples of
graphics applications.


MATRIX MULTIPLICATION 


The size of a matrix is defined by the number of rows
and columns it contains. For example, the following is a 5 x 3
matrix since it contains five rows and three columns.


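$$
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33} \\
a_{41} & a_{42} & a_{43} \\
a_{51} & a_{52} & a_{53}
\end{bmatrix}
$$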


Two matrices can be multiplied together as long as
the number of columns in the first matrix equals the number
of rows in the second. This condition is called conformability.
For example, if a matrix A is an M x N matrix and a matrix
B is an N x P matrix, then the two can be multiplied together,
and the resulting matrix is of size M x P.


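For example, with M x N = 2 x 2 and N x P = 2 x 1, the
product is of size M x P = 2 x 1. If the first row of A is
[3 4] and B is the column [4 6]^T, the first element of the
product is

$$
c_{11} = (3)(4) + (4)(6) = 36
$$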


Given the two conformable matrices A and B, the
elements of C = A x B are given by:

$$
c_{ij} = \sum_{k=1}^{N} a_{ik}\, b_{kj}
$$
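As a sketch only, the summation maps directly onto three nested loops in C. The function name mat_mul, the row-major flat-array layout, and the 0-based indices are assumptions made for this illustration; they are not the memory layout used by the appendix programs.

```c
#include <stddef.h>

/* C = A x B, where A is m x n, B is n x p, and C is m x p.
 * Each matrix is a flat row-major array: element (i, j) of an
 * r x s matrix X is X[i * s + j]. Indices start at 0, not 1. */
void mat_mul(const long *a, const long *b, long *c,
             size_t m, size_t n, size_t p)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < p; j++) {
            long sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * p + j];   /* a_ik * b_kj */
            c[i * p + j] = sum;                       /* c_ij */
        }
    }
}
```

For instance, with A = [3 4; 1 2] (the second row is invented for the illustration) and B = [4 6]^T, calling mat_mul(a, b, c, 2, 2, 1) leaves 36 in c[0].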


Q12 FORMAT 


Applications often require multiplication of mixed
numbers, that is, numbers with both an integer and a fractional
part. Since the TMS32010 and TMS32020 implement
fixed-point arithmetic, the programs in the appendices assume
a Q12 format, i.e., 12 bits follow an assumed binary point.
The bits to the right of the assumed binary point represent
the fractional part of the number, and the four bits to the left,
including the sign bit, represent the integer part of the number.
An example of Q12 format is as follows:


0001.110111100000 ≈ 1.866 (the period marks the assumed binary point)

    0000.110111100000 ≈ 0.866 in Q12
  x 0000.100000000000 = 0.5 in Q12
  ------------------------------------------
00000000.011011110000000000000000 ≈ 0.433 in Q24


The result of a Q12 by Q12 multiplication is a 32-bit number
in Q24 format that can easily be converted to Q12 by a
logical left-shift of four, keeping the upper 16 bits (the high
word of the accumulator). The first four bits are lost, as
well as the last twelve. The low-order bits lie below the
resolution of Q12, and the high-order bits carry only sign
information as long as the result does not overflow. Note that
the programs in the appendices provide no protection against
overflow; therefore, the design engineer should choose a
format that best fits the application.
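The conversion can be sketched in C as follows. This is an illustration under stated assumptions, not the appendix code: q12_mul and q12_to_double are hypothetical names, and 16-bit two's-complement words are assumed. The left shift is done on an unsigned copy because left-shifting a negative signed value is undefined in C.

```c
#include <stdint.h>

/* Multiply two Q12 numbers (4 integer bits including sign, 12 fraction bits).
 * The full product is Q24 in a 32-bit word. Shifting it left four places and
 * keeping the upper 16 bits discards the 4 most significant and the 12 least
 * significant bits, leaving a Q12 result. No overflow protection is provided. */
int16_t q12_mul(int16_t a, int16_t b)
{
    int32_t q24 = (int32_t)a * (int32_t)b;   /* Q12 x Q12 -> Q24 */
    uint32_t shifted = (uint32_t)q24 << 4;   /* unsigned shift avoids undefined behavior */
    return (int16_t)(shifted >> 16);         /* high word is the Q12 result */
}

/* Hypothetical helper for inspecting results: Q12 word -> real value. */
double q12_to_double(int16_t x)
{
    return x / 4096.0;
}
```

With the bit patterns from the example, q12_mul(0x0DE0, 0x0800) returns 0x06F0, which is the Q12 pattern 0000.011011110000 (about 0.434).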


GRAPHICS APPLICATIONS 


Operations in graphics applications, such as translation,
scaling, or rotation, require matrix manipulations to be
performed in a limited amount of time, which makes the
TMS32010 and TMS32020 processors well suited to these
applications. Scaling and rotation of points in a coordinate
system require multiplication of matrices. Translation is
typically implemented by addition of two matrices. However,
when points are represented in a homogeneous coordinate
system, translation can also be implemented by multiplication.
In a homogeneous coordinate system, a point P(x, y) is
represented as P(x, y, 1). This type of coordinate system is
desirable because it expresses translation in the same form as
scaling and rotation, as a matrix multiplication.

Translation can be defined as the moving of a point
or points in a coordinate system from one location to another
without rotating. This is accomplished by adding a
displacement value D_x to the X coordinate of a point and
adding a displacement value D_y to the Y coordinate, thus
moving the point from one location to another. Figure 1
shows both the addition and multiplication methods of
translation and an example of each.
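The multiplication method can be checked against the addition method with a short C sketch. translate_point is a hypothetical name, and double-precision arithmetic is used for clarity instead of Q12. Because the third element of [x y 1] is 1, the bottom row of the matrix adds D_x and D_y to the coordinates.

```c
/* Translate the homogeneous point [x y 1] by dx and dy using the matrix
 *   |  1   0   0 |
 *   |  0   1   0 |
 *   | dx  dy   1 |
 * Row vector times matrix: out[j] = sum over k of in[k] * t[k][j]. */
void translate_point(const double in[3], double out[3], double dx, double dy)
{
    const double t[3][3] = { {1, 0, 0}, {0, 1, 0}, {dx, dy, 1} };
    for (int j = 0; j < 3; j++) {
        out[j] = 0.0;
        for (int k = 0; k < 3; k++)
            out[j] += in[k] * t[k][j];
    }
}
```

Translating the hypothetical point (2, 3) with D_x = 5 and D_y = 1 gives [7 4 1], the same result as adding [5 1] to [2 3].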

ADDITION METHOD

$$
[X_{new}\; Y_{new}] = [X_{old}\; Y_{old}] + [D_x\; D_y]
$$

where D_x = 5 and D_y = 1

MULTIPLICATION METHOD

$$
[X_{new}\; Y_{new}\; 1] = [X_{old}\; Y_{old}\; 1]
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
D_x & D_y & 1
\end{bmatrix}
$$

where D_x = 5 and D_y = 1

Figure 1. Translation of Coordinates

Similar to translation, scaling can be implemented by
matrix multiplication. Points can be scaled by multiplying
each coordinate of a point (or points) by scaling values S_x
and S_y. Scaling an object is similar to stretching or shrinking
it: the coordinates of each point that makes up the object are
multiplied by a scaling value, which scales the object to a
larger or smaller size. Figure 2 shows the scaling of an object
from one size to another.
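A C sketch of the scaling matrix follows. scale_point is a hypothetical name, and doubles are used instead of Q12 to keep the arithmetic readable.

```c
/* Scale the homogeneous point [x y 1] by sx and sy using the matrix
 *   | sx   0   0 |
 *   |  0  sy   0 |
 *   |  0   0   1 |
 * Row vector times matrix: out[j] = sum over k of in[k] * s[k][j]. */
void scale_point(const double in[3], double out[3], double sx, double sy)
{
    const double s[3][3] = { {sx, 0, 0}, {0, sy, 0}, {0, 0, 1} };
    for (int j = 0; j < 3; j++) {
        out[j] = 0.0;
        for (int k = 0; k < 3; k++)
            out[j] += in[k] * s[k][j];
    }
}
```

Scaling [4 4 1] with S_x = S_y = 0.5 yields [2 2 1], the values used in Figure 2.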


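BEFORE SCALING: [X Y 1] = [4 4 1]

AFTER SCALING (let the scaling factors S_x = S_y = 0.5):

$$
[X_{new}\; Y_{new}\; 1] = [X_{old}\; Y_{old}\; 1]
\begin{bmatrix}
S_x & 0 & 0 \\
0 & S_y & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

$$
[X\; Y\; 1] = [4\; 4\; 1]
\begin{bmatrix}
0.5 & 0 & 0 \\
0 & 0.5 & 0 \\
0 & 0 & 1
\end{bmatrix}
= [2\; 2\; 1]
$$

Figure 2. Scaling From One Size To Another

Rotation of the coordinates of a point (or points) through
an angle theta about the origin can also be accomplished by a
matrix multiplication. The following equations lead to the
matrix multiplication required to rotate an object through
any angle. A point at distance r from the origin and at angle
phi satisfies

$$
X_{old} = r\cos\phi, \qquad Y_{old} = r\sin\phi
$$

Rotating it through theta gives

$$
X_{new} = r\cos(\theta + \phi) = r\cos\phi\cos\theta - r\sin\phi\sin\theta
$$

$$
Y_{new} = r\sin(\theta + \phi) = r\cos\phi\sin\theta + r\sin\phi\cos\theta
$$

so that

$$
X_{new} = X_{old}\cos\theta - Y_{old}\sin\theta
$$

$$
Y_{new} = X_{old}\sin\theta + Y_{old}\cos\theta
$$

or, in matrix form,

$$
[X_{new}\; Y_{new}\; 1] = [X_{old}\; Y_{old}\; 1]
\begin{bmatrix}
\cos\theta & \sin\theta & 0 \\
-\sin\theta & \cos\theta & 0 \\
0 & 0 & 1
\end{bmatrix}
$$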
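A C sketch of the rotation matrix follows. It takes cos theta and sin theta as precomputed arguments, much as a DSP program would load them as constants, which also keeps the sketch free of the math library. rotate_point is a hypothetical name, and doubles are used instead of Q12.

```c
/* Rotate the homogeneous point [x y 1] counterclockwise about the origin
 * using the matrix
 *   |  cos_t  sin_t  0 |
 *   | -sin_t  cos_t  0 |
 *   |    0      0    1 |
 * cos_t and sin_t are the cosine and sine of the rotation angle. */
void rotate_point(const double in[3], double out[3], double cos_t, double sin_t)
{
    const double r[3][3] = { {cos_t, sin_t, 0}, {-sin_t, cos_t, 0}, {0, 0, 1} };
    for (int j = 0; j < 3; j++) {
        out[j] = 0.0;
        for (int k = 0; k < 3; k++)
            out[j] += in[k] * r[k][j];   /* row vector times matrix */
    }
}
```

With cos 30° ≈ 0.866 and sin 30° = 0.5, rotating [7 2 1] gives approximately [5.062 5.232 1], which is the 30-degree example of Figure 3 before truncation to one decimal place.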


Figure 3 shows an implementation of these equations 
to rotate an object 30 degrees about the origin. 

$$
[X\; Y\; 1] = [7\; 2\; 1]
\begin{bmatrix}
\cos 30^\circ & \sin 30^\circ & 0 \\
-\sin 30^\circ & \cos 30^\circ & 0 \\
0 & 0 & 1
\end{bmatrix}
= [7\; 2\; 1]
\begin{bmatrix}
0.866 & 0.5 & 0 \\
-0.5 & 0.866 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

$$
[X\; Y\; 1] = [5.0\; 5.2\; 1]
$$

Figure 3. Implementation of Rotation Matrix

Figures 4 and 5 show a segment of straight-line
TMS32010 and TMS32020 code, respectively. These
programs calculate the coordinate rotation example using a
Q12 format. Note that once the matrices are loaded into
memory, the processors can calculate the results in 5.4
microseconds. The segment of TMS32020 code in Figure 5
uses the MAC instruction. For small matrices, the
MAC instruction in conjunction with the RPT instruction
gains little due to the overhead timing of the MAC
instruction. However, for larger matrices, this method is
most efficient since the MAC instruction becomes single-cycle
in the repeat mode. For applications that only require
translation, scaling, or rotation of coordinates, straight-line
code as in Figures 4 and 5 is more efficient than the larger
programs in the appendices.
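The Q12 arithmetic behind the rotation example can be sketched in C. This is not a transcription of Figures 4 or 5: q12_vec_mat3 is a hypothetical name, and 3547 is an assumed Q12 rounding of cos 30° (the Q12 example earlier used the slightly different pattern 0x0DE0). Each dot product is accumulated as a 32-bit Q24 sum and converted back to Q12 only once at the end, with the same shift-left-four, keep-the-high-word step described in the Q12 section.

```c
#include <stdint.h>

/* Multiply a 1 x 3 Q12 row vector by a 3 x 3 Q12 matrix.
 * Each output is accumulated in Q24, then converted to Q12 by shifting
 * left four bits and keeping the upper 16 bits. No overflow protection. */
void q12_vec_mat3(const int16_t v[3], int16_t m[3][3], int16_t out[3])
{
    for (int j = 0; j < 3; j++) {
        int32_t acc = 0;                                  /* Q24 accumulator */
        for (int k = 0; k < 3; k++)
            acc += (int32_t)v[k] * m[k][j];               /* multiply-accumulate */
        out[j] = (int16_t)(((uint32_t)acc << 4) >> 16);   /* Q24 -> Q12 */
    }
}
```

With v = [7 2 1] in Q12 (28672, 8192, 4096) and the rotation matrix {{3547, 2048, 0}, {-2048, 3547, 0}, {0, 0, 4096}}, the outputs are 20733 (about 5.062), 21430 (about 5.232), and 4096 (1.0), consistent with [5.0 5.2 1] in Figure 3.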
Figure 4. TMS32010 Code for Rotation (Concluded)

Figure 5. TMS32020 Code for Rotation
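Because translation, scaling, and rotation are all expressed as 3 x 3 matrices acting on the row vector [x y 1], successive operations can be folded into one matrix by multiplying the matrices together first. The C sketch below uses hypothetical names (mat3_mul, vec3_mul) and double precision rather than Q12. With row vectors, applying S and then T is the same as applying the single product S x T.

```c
/* c = a x b for 3 x 3 matrices. */
void mat3_mul(double a[3][3], double b[3][3], double c[3][3])
{
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            c[i][j] = 0.0;
            for (int k = 0; k < 3; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

/* out = v x m, where v is the homogeneous row vector [x y 1]. */
void vec3_mul(const double v[3], double m[3][3], double out[3])
{
    for (int j = 0; j < 3; j++) {
        out[j] = 0.0;
        for (int k = 0; k < 3; k++)
            out[j] += v[k] * m[k][j];
    }
}
```

Scaling [4 4 1] by 0.5 and then translating by D_x = 5, D_y = 1 gives [7 3 1] whether the two matrices are applied one after the other or as their product. Because points are row vectors here, the matrix applied first appears on the left of the product; the reverse order (T x S) would also scale the displacement.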


To combine translation, scaling, and rotation, a more
general matrix can be implemented.

GENERAL MATRIX FOR TWO-DIMENSIONAL SYSTEMS

$$
\begin{bmatrix}
r_{11} & r_{12} & 0 \\
r_{21} & r_{22} & 0 \\
t_x & t_y & 1
\end{bmatrix}
$$

The upper 2 x 2 matrix is a combination of the rotation matrix
and the scaling matrix. The t_x and t_y values are the translation
values. A three-dimensional general matrix can be developed
in the same way as the two-dimensional translation, scaling,
and rotation matrix.

GENERAL MATRIX FOR THREE-DIMENSIONAL SYSTEMS

$$
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & 0 \\
r_{21} & r_{22} & r_{23} & 0 \\
r_{31} & r_{32} & r_{33} & 0 \\
t_x & t_y & t_z & 1
\end{bmatrix}
$$

IMPLEMENTATION OF THE MATRIX
MULTIPLICATION ALGORITHM
FOR THE TMS32010

The implementation of the algorithm for the TMS32010
shown in Figure 6 assumes that the two matrices to be
multiplied together are of size M x N and N x P. Three major
loops are included to multiply the two matrices. The outside
loop control is labeled MCOUNT since it controls which row
in the A matrix is being referenced during the multiplication.
The secondary loop control is labeled PCOUNT because it
counts how many columns in the B matrix have been
processed. The inside loop control is labeled NCOUNT since
it controls the multiplication of the values in the A matrix
with the values in the B matrix.

INITIALIZATION: INPUT M, N, AND P.
CALCULATE SIZE OF MATRICES A AND B. A IS M x N, AND B IS N x P.
INPUT THE A MATRIX BY ROWS. STORE THESE VALUES IN MEMORY IMMEDIATELY AFTER THE INITIALIZATION VALUES.
INPUT THE B MATRIX BY COLUMNS. STORE THESE VALUES IN MEMORY FOLLOWING THE A MATRIX VALUES.
MCOUNT = 0.
INCREMENT MCOUNT AND DEFINE THE BEGINNING OF THE FIRST UNPROCESSED ROW AS BDIS. SET POINTER2 POINTING AT THE BEGINNING OF THE B MATRIX. SET PCOUNT = 0.
INCREMENT PCOUNT. LOAD POINTER1 WITH BDIS (FIRST UNPROCESSED ROW). SET NCOUNT = 0. CLEAR ACCUMULATOR.
INCREMENT NCOUNT. MULTIPLY VALUE POINTED AT BY POINTER1 WITH VALUE POINTED AT BY POINTER2 AND ACCUMULATE. INCREMENT BOTH POINTERS.
OUTPUT ANSWER.

Figure 6. TMS32010 Flowchart

INITIALIZATION: INPUT M, N, AND P.
READ B MATRIX INTO BLOCK B1.
HAS THE LAST ROW BEEN ENTERED YET?
INPUT A ROW OF THE A MATRIX.
CLEAR ACCUMULATOR AND P REGISTER.
MULTIPLY THE ROW OF THE A MATRIX BY A COLUMN IN THE B MATRIX.
OUTPUT RESULT.

Figure 7. TMS32020 Flowchart
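The three nested loops of Figure 6 can be modeled in C. This is a sketch of the loop structure only, not a translation of the appendix program: the array mem stands in for data RAM, and the names mem, a_base, and tms32010_style_matmul are hypothetical. A is stored by rows starting at a_base, and B is stored by columns immediately after A, so both pointers simply advance by one inside the NCOUNT loop.

```c
#include <stddef.h>

/* A (m x n) is stored by rows at mem[a_base...]; B (n x p) is stored by
 * columns immediately after A. The m x p result goes to c, row-major. */
void tms32010_style_matmul(const long *mem, size_t a_base,
                           size_t m, size_t n, size_t p, long *c)
{
    size_t b_base = a_base + m * n;              /* B follows the A values */
    size_t bdis = a_base;                        /* first unprocessed row of A */
    for (size_t mcount = 0; mcount < m; mcount++) {        /* rows of A */
        size_t pointer2 = b_base;                /* back to first column of B */
        for (size_t pcount = 0; pcount < p; pcount++) {    /* columns of B */
            size_t pointer1 = bdis;              /* start of the current row */
            long acc = 0;                        /* clear accumulator */
            for (size_t ncount = 0; ncount < n; ncount++)  /* multiply-accumulate */
                acc += mem[pointer1++] * mem[pointer2++];
            c[mcount * p + pcount] = acc;        /* output answer */
        }                                        /* pointer2 now at next column */
        bdis += n;                               /* advance to the next row of A */
    }
}
```

Note that pointer2 is not reset between columns: after N multiplications it already points at the start of the next column of B, which is why storing B by columns suits this loop.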


IMPLEMENTATION OF THE MATRIX 
MULTIPLICATION ALGORITHM 
FOR THE TMS32020 


The implementation of the algorithm for the TMS32020
is somewhat different, since its more advanced instruction set
allows a more efficient method of computing matrix
multiplication. The TMS32020 version in Figure 7 also
assumes that the two matrices to be multiplied are of size
M x N and N x P. This program takes a row of the A matrix,
loads it into block B0 of data memory, and then multiplies
this row by all columns in the B matrix. The TMS32020
continues this process until all the rows in the A matrix have
been multiplied by all the columns in the B matrix. As in
the TMS32010 version, the A matrix must be entered by rows
and the B matrix by columns. Storing B by columns places
the elements of each column in consecutive locations, so both
operands can be stepped through with simple auto-increment
addressing, which shortens execution time. Figure 7 shows
the basic implementation of the matrix multiplication
algorithm that the TMS32020 uses to multiply two matrices.
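The row-at-a-time approach of Figure 7 can be sketched in C as well. The local array row models block B0, and MAX_N is set to 256 to mirror the size of that block; the function name and argument layout are hypothetical. B stays resident, stored by columns, while each row of A is copied into the buffer and then multiplied by every column of B before the next row is read.

```c
#include <stddef.h>

#define MAX_N 256   /* models the 256-word block B0 */

/* a_rows: A (m x n) by rows; b_cols: B (n x p) by columns.
 * The m x p result goes to c, row-major. Returns -1 if n > MAX_N. */
int tms32020_style_matmul(const long *a_rows, const long *b_cols,
                          size_t m, size_t n, size_t p, long *c)
{
    long row[MAX_N];                           /* one row of A at a time */
    if (n > MAX_N)
        return -1;
    for (size_t i = 0; i < m; i++) {
        for (size_t k = 0; k < n; k++)         /* input a row of the A matrix */
            row[k] = a_rows[i * n + k];
        for (size_t j = 0; j < p; j++) {
            const long *col = b_cols + j * n;  /* column j of B is contiguous */
            long acc = 0;                      /* clear accumulator */
            for (size_t k = 0; k < n; k++)     /* role played by RPT with MAC */
                acc += row[k] * col[k];
            c[i * p + j] = acc;                /* output result */
        }
    }
    return 0;
}
```

The innermost loop is the part that the repeated MAC instruction performs in hardware; the copy of each row into a fixed buffer mirrors loading one row of A into block B0 before reconfiguring it.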

Since the programs in the appendices treat the matrices
differently, a memory map is included to help in
understanding the two versions. Figure 8 shows how the
matrices should look in memory after they have been entered.
Note that for the TMS32020 version, the A matrix values
reside in program memory because the CNFP (configure block
B0 as program memory) instruction has been executed. Note
also that only one row of the A matrix is in this block, since
the program enters one row at a time.


For the following matrices,

$$
A = \begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22}
\end{bmatrix},
\qquad
B = \begin{bmatrix}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23}
\end{bmatrix}
$$

the memory would be configured in this manner for the TMS32010 and TMS32020.

TMS32010
DATA MEMORY
LOCATION (HEX)   VALUE
>00F             a11
>010             a12
>011             a21
>012             a22
>013             b11
>014             b21
>015             b12
>016             b22
>017             b13
>018             b23

TMS32020
DATA MEMORY                  PROGRAM MEMORY
LOCATION (HEX)   VALUE       LOCATION (HEX)   VALUE
>308             b11         >FF00            a11
>309             b21         >FF01            a12
>30A             b12
>30B             b22
>30C             b13
>30D             b23

Figure 8. Memory Maps
SUMMARY

The TMS32010 and TMS32020 processors can be used
to multiply large matrices efficiently. A brief review of
matrix multiplication has been given to explain its
fundamentals, and three examples of graphics applications
have been presented, since these applications often require
multiplication of matrices.

The TMS320 family has the power and flexibility to
cost-effectively implement a wide range of high-speed
graphics, numerical analysis, digital signal processing, and
control applications. Since the TMS32010 and TMS32020
combine the flexibility of a high-speed controller with the
numerical capability of an array processor, a new approach
to applications such as graphics can now be considered.
[Program listing not legible in the source scan. Legible comment fragments include: ... INTO DATA RAM; MORE INITIALIZATION; CALCULATE A x B; LOAD ACCUMULATOR WITH HIGH WORD OF THE RESULT. LEFT-SHIFT FOUR TO CONVERT TO Q12. NOTE THAT ONLY THE 12 MSBs ARE SIGNIFICANT. The assembler reported NO ERRORS, NO WARNINGS.]


Appendix B 


[Program listing not legible in the source scan. The header comment reads: ALL INPUTS AND OUTPUTS FOR THIS PROGRAM SHOULD BE IN Q12 FORMAT EXCEPT FOR M, N, AND P, WHICH SHOULD BE Q0. Legible section comments include: INITIALIZATION; READ SIZES OF MATRICES; MORE INITIALIZATION; CALL ROUTINE TO READ IN A ROW OF THE A MATRIX; CLEAR ACCUMULATOR AND P REGISTER; MULTIPLY A ROW BY A COLUMN.]